This notebook was made by Laxman Kumar as part of the submission for CIS 731 ANN Homework 2.

NOTE: All computations were run on a Mac laptop with an Intel i6 1.4 GHz processor and 16 GB RAM. No GPU was involved; all computation was done on the CPU only.

In [156]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import sklearn
import numpy as np
from sklearn.metrics import mean_squared_error 
  
from torch.utils import data as T
import torch
from torchviz import make_dot, make_dot_from_trace
from torchsummary import summary
from torch.autograd import Variable
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.nn.functional as F
import seaborn as sns
import time
import matplotlib.patches as mpatches
import plotly.io as pio
import plotly.graph_objects as go
pio.renderers.default = "notebook"

Dataset description

I used GameStop (GME) stock market data from Yahoo Finance. The data runs from 2002 to the current date.

In [2]:
df = pd.read_csv("GME (1).csv")
In [3]:
df.head()
Out[3]:
Date Open High Low Close Adj Close Volume
0 2002-02-13 9.625 10.060 9.525 10.050 6.766666 19054000
1 2002-02-14 10.175 10.195 9.925 10.000 6.733003 2755400
2 2002-02-15 10.000 10.025 9.850 9.950 6.699336 2097400
3 2002-02-19 9.900 9.900 9.375 9.550 6.430017 1852600
4 2002-02-20 9.600 9.875 9.525 9.875 6.648838 1723200

We use only the closing price of the stock on each day as our variable.

In [4]:
df = df[['Close']]

Making the train, validation and test sets

We create three sets: training, validation and test. The last two years of data are used for testing, the third-to-last year for validation, and the remaining data for training.
In [5]:
train_data = df[:4027]
validation_data = df[4027:4250]
test_data = df[4250:]

Using a MinMaxScaler to scale our training, validation and test data. The scaler is fit on the training set only, so information from the validation and test sets does not leak into the transformation.

In [6]:
#initializing scaler object
scaler = MinMaxScaler()
#fitting our training data into scaler object
scaler = scaler.fit(train_data)

#transforming training, testing and validation data using scaler object
train_data = scaler.transform(train_data)
test_data = scaler.transform(test_data)
validation_data = scaler.transform(validation_data)
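To illustrate why the scaler is fit on the training data only, here is a small self-contained sketch (toy numbers, not the stock data): the min and max come from the training set, so transformed test values can fall outside [0, 1], and `inverse_transform` recovers the original scale.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# toy numbers (not the stock data): the scaler learns min=10, max=30 from train only
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[25.0], [40.0]])

scaler_demo = MinMaxScaler().fit(train)
print(scaler_demo.transform(train).ravel())   # [0.  0.5 1. ]
print(scaler_demo.transform(test).ravel())    # [0.75 1.5 ] -- test values may exceed [0, 1]
print(scaler_demo.inverse_transform(scaler_demo.transform(test)).ravel())  # [25. 40.]
```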

The function below creates sequences of the length provided as a parameter. It builds a new dataset in which the features are seq_length consecutive days of data and the target is the following day's value. For example, with a sequence length of 3, the first 3 days of data form X (the features) and the 4th day is y (the target).

In [7]:
def create_sequences(data, seq_length):
    xs = []
    ys = []
    for i in range(len(data)-seq_length-1):
        x = data[i:(i+seq_length)]   # seq_length consecutive observations -> features
        y = data[i+seq_length]       # the observation that follows them -> target
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
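As a quick sanity check (a toy example, not part of the assignment data), running create_sequences on a small column vector shows the shapes it produces; note each sample keeps a trailing feature dimension of 1:

```python
import numpy as np

def create_sequences(data, seq_length):
    xs, ys = [], []
    for i in range(len(data) - seq_length - 1):
        xs.append(data[i:(i + seq_length)])  # seq_length consecutive observations
        ys.append(data[i + seq_length])      # the observation that follows them
    return np.array(xs), np.array(ys)

toy = np.arange(10).reshape(-1, 1)       # a column vector, like df[['Close']]
X_toy, y_toy = create_sequences(toy, 3)
print(X_toy.shape, y_toy.shape)          # (6, 3, 1) (6, 1)
print(X_toy[0].ravel(), y_toy[0])        # [0 1 2] [3]
```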
In [8]:
seq_length = 4

#creating sequential data for training, testing and validation set
X_train, y_train = create_sequences(train_data, seq_length)
X_test, y_test = create_sequences(test_data, seq_length)
X_val, y_val = create_sequences(validation_data, seq_length)

#converting our features and target for all three set into tensors
X_train = torch.from_numpy(X_train).float()
y_train = torch.from_numpy(y_train).float()

X_test = torch.from_numpy(X_test).float()
y_test = torch.from_numpy(y_test).float()

X_val = torch.from_numpy(X_val).float()
y_val = torch.from_numpy(y_val).float()

Important User Defined Functions

The function below trains a model for the number of epochs passed as a parameter. We use MSE as the loss function for all predictions and Adam as the optimizer.
In [9]:
def training(epoch,model,train,validation):
    num_epochs = epoch
    
    learning_rate = 0.01
    
    X_train = train[0]
    y_train = train[1]
    
    X_val = validation[0]
    y_val = validation[1]
    #Loss function
    criterion = torch.nn.MSELoss()    # mean squared error for regression
    #Initializing the Adam optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    #optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    
    #Initializing empty lists for storing training loss, validation loss and computation time
    train_lss = []
    val_loss = []
    computation_time = []
    
    #Starting the timer 
    time_start = time.process_time()
    
    # Train the model
    for epoch in range(num_epochs):
        outputs = model(X_train)
        optimizer.zero_grad()
        # compute the training loss
        loss = criterion(outputs, y_train)
        with torch.no_grad():
            y_val_pred = model(X_val)
            val_lss = criterion(y_val_pred.float(), y_val)
            val_loss.append(val_lss.item())
        train_lss.append(loss.item())

        loss.backward()
        optimizer.step()

        if epoch % 20 == 0:
            print("Epoch: %d, training loss: %1.5f , validation loss: %1.5f" 
                  % (epoch, loss.item(),val_lss.item()))
    
    #Ending the timer
    computation_time.append((time.process_time() - time_start))
    
    #returning training loss, validation loss and the computation time as the output of the function
    return train_lss,val_loss,computation_time
The function below plots the training and validation loss returned by our training function above. It produces two plots: one with the original y-limits and one with the y-axis zoomed in to (0, 0.010).
In [10]:
def plot_TrainTestMSE(epoch,train_loss,test_loss):
    
    blue_patch = mpatches.Patch(color='blue', label='Train MSE')
    green_patch = mpatches.Patch(color='orange', label='Validation MSE')
    
    #Using matplotlib subplot to create two subplot and specifying figure size as 14x6
    f, axes = plt.subplots(1, 2,figsize=(14,6))
    
    #plotting the first plot
    #plotting the train loss
    sns.lineplot(x=range(1,epoch+1),y=train_loss,ax=axes[0])
    #plotting the validation loss
    sns.lineplot(x=range(1,epoch+1),y=test_loss,ax=axes[0])
    #specifying the title for the first plot
    axes[0].title.set_text('Original Plot')
    #specifying the xlabel and ylabel for the first plot
    axes[0].set_xlabel("EPOCH")
    axes[0].set_ylabel("MSE")

    #plotting the second plot
    #plotting the train loss
    sns.lineplot(x=range(1,epoch+1),y=train_loss,ax=axes[1])
    #plotting the validation loss
    sns.lineplot(x=range(1,epoch+1),y=test_loss,ax=axes[1])
    #specifying the title for the second plot
    axes[1].title.set_text('Modified Y-axis plot')
    #specifying the xlabel for the second plot
    axes[1].set_xlabel("EPOCH")
    
    #specifying the limit for y axis
    axes[1].set_ylim(0,0.010)
    
    axes[0].legend(handles=[blue_patch,green_patch])
    axes[1].legend(handles=[blue_patch,green_patch])
    #removing the top border of the plot
    sns.despine(top=True)
    
    #specifying the super title for both the plot
    plt.suptitle("Training and Validation loss");
Function to make predictions. It takes a model and test data as input parameters and returns both the real and the predicted target values.
In [11]:
def testPrediction(X_test,y_test,model):

    predictions = model(X_test)
    #converting the predicted and the real values into numpy arrays for use with the scaler object
    #(note: the model output is the prediction; y_test holds the real values)
    data_predicted = predictions.detach().numpy()
    data_real = y_test.detach().numpy()
    
    #using the scaler object initialized earlier to inverse-transform both arrays back to the original price scale
    data_predicted = scaler.inverse_transform(data_predicted)
    data_real = scaler.inverse_transform(data_real)
    
    return data_real,data_predicted
The function below plots the real values of our test data against the values predicted by the model passed as a parameter.
In [152]:
def plot_test(model,X_test):
    data_real,data_predict = testPrediction(X_test,y_test,model)

    fig = go.Figure()

    fig.add_trace(go.Scatter(x=list(range(1,len(data_real)+1)), y=data_real.reshape(-1),
                    mode='lines',
                    name='Original Data'))
    fig.add_trace(go.Scatter(x=list(range(1,len(data_predict)+1)), y=data_predict.reshape(-1),
                    mode='lines',
                    name='Predicted Data'))
    fig.update_layout(title_text="Original data points vs predicted data points")
    fig.show()
In [68]:
torch.manual_seed(0)
Out[68]:
<torch._C.Generator at 0x7fa99d5d46f0>

Plain Backpropagation

For the first task, plain backpropagation, we create a fully connected model with two hidden layers and sigmoid activations. Since our sequences contain 4 features, the network has 4 input nodes and 1 output node.
In [69]:
class PB(nn.Module):
    def __init__(self):
        super(PB, self).__init__()
        self.fc1 = nn.Linear(4, 16)
        self.fc2 = nn.Linear(16, 32)
        self.fc3 = nn.Linear(32, 1)

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        x = self.fc3(x)
        
        return x
In [70]:
summary(PB(),input_size=(4022,4))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Linear-1             [-1, 4022, 16]              80
            Linear-2             [-1, 4022, 32]             544
            Linear-3              [-1, 4022, 1]              33
================================================================
Total params: 657
Trainable params: 657
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.06
Forward/backward pass size (MB): 1.50
Params size (MB): 0.00
Estimated Total Size (MB): 1.57
----------------------------------------------------------------

Above is the summary of our model; it describes how much memory is required to run the model and how large the model will be after training.
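As a quick cross-check of the parameter counts in the summary (a hand calculation, not part of torchsummary), each Linear(in, out) layer contributes in*out weights plus out biases:

```python
# Hand-check of the torchsummary parameter counts:
# each Linear(in, out) layer has in*out weights plus out biases.
layers = [(4, 16), (16, 32), (32, 1)]
params = [i * o + o for i, o in layers]
print(params, sum(params))  # [80, 544, 33] 657
```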

In [71]:
"""
We have to reshape the data for our linear model. Since we will pass one dimensional data, and our data is 
in 2 dimension. So we will use reshape function to re-shape the data.
Example: [[[1],[2],[3]]] -> shape is (1,3,1)

The above data is 3 dimension, where the first dimension indicates total number of rows in our data, second and third
dimension gives us the dimension of the data passed into the model.

So we have to convert (1,3,1) => (1,3) so that we can pass that into our linear model.
"""

X_train1 = X_train.reshape(4022,4)
X_test1 = X_test.reshape(441,4)
X_val1 = X_val.reshape(218,4)
In [72]:
model = PB()
make_dot(model(X_train1), params=dict(model.named_parameters()))
Out[72]:
[torchviz computation graph of the PB model: input -> fc1 (AddmmBackward) -> SigmoidBackward -> fc2 (AddmmBackward) -> SigmoidBackward -> fc3 (AddmmBackward)]
Above is the flow diagram of our plain backpropagation model, showing how the layers are linked. Reading from the input end: the first layer's weights are transposed (TBackward) and combined with the input and the bias in AddmmBackward, which is the affine transform of the linear layer. Its result passes through SigmoidBackward, the activation. The same pattern repeats for the second layer, and finally the output layer's weights, the activation result and a bias feed the last AddmmBackward, producing the output.

We first instantiate the model class and then call our training function for 300 epochs with the training and validation data.

In [79]:
pb = PB()
train_loss,test_loss,pb_computations = training(300,pb,[X_train1,y_train],[X_val1,y_val])
Epoch: 0, training loss: 0.11674 , validation loss: 0.00680
Epoch: 20, training loss: 0.04402 , validation loss: 0.01771
Epoch: 40, training loss: 0.02928 , validation loss: 0.02926
Epoch: 60, training loss: 0.00517 , validation loss: 0.00360
Epoch: 80, training loss: 0.00178 , validation loss: 0.00083
Epoch: 100, training loss: 0.00092 , validation loss: 0.00012
Epoch: 120, training loss: 0.00072 , validation loss: 0.00014
Epoch: 140, training loss: 0.00060 , validation loss: 0.00013
Epoch: 160, training loss: 0.00051 , validation loss: 0.00013
Epoch: 180, training loss: 0.00045 , validation loss: 0.00012
Epoch: 200, training loss: 0.00040 , validation loss: 0.00012
Epoch: 220, training loss: 0.00037 , validation loss: 0.00012
Epoch: 240, training loss: 0.00035 , validation loss: 0.00011
Epoch: 260, training loss: 0.00033 , validation loss: 0.00011
Epoch: 280, training loss: 0.00032 , validation loss: 0.00011
In [18]:
pb_computations
Out[18]:
[1.7134310000000004]
Above are the training and validation losses of our plain backpropagation model. From the data, we can observe that both losses decreased until about epoch 120, after which the validation loss stayed roughly constant. It took about 1.71 seconds to train the model.
In [19]:
plot_TrainTestMSE(300,train_loss,test_loss)
The plot above shows MSE against the number of epochs. As both panels show, the loss dropped sharply in the first 50 epochs and then decreased more slowly. After the first 150 epochs, both the training and validation MSE decrease very slowly and settle below 0.001.
In [22]:
data_real,data_predict = testPrediction(X_test1,y_test,pb)
pb_mse = mean_squared_error(data_real,data_predict) 
pb_mse
Out[22]:
0.30662194
For the plain backpropagation model we got a mean squared error of 0.3066 on the test data.
In [157]:
plot_test(pb,X_test1)
Above is the line graph of the original data points and our predicted data points. Our linear model performed well, as it was able to capture the trend of the data.

Backpropagation Through Time

We will use an RNN layer followed by a linear layer and an output layer. torch.nn.RNN takes two inputs: the input data and the initial hidden state. Details of the inputs are given below.

input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence (with batch_first=True, as used here, the shape is (batch, seq_len, input_size)). The input can also be a packed variable-length sequence; see torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.

h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2; otherwise it should be 1.
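A minimal sketch (toy tensors, using the same layer sizes as below) of the nn.RNN input and output shapes with batch_first=True:

```python
import torch
import torch.nn as nn

# batch_first=True, so the input is (batch, seq_len, input_size)
rnn = nn.RNN(input_size=1, hidden_size=2, num_layers=1, batch_first=True)
x = torch.randn(5, 4, 1)        # 5 sequences of length 4, one feature per step
h0 = torch.zeros(1, 5, 2)       # (num_layers * num_directions, batch, hidden_size)
out, h_n = rnn(x, h0)
print(out.shape)  # torch.Size([5, 4, 2]) -- hidden state at every time step
print(h_n.shape)  # torch.Size([1, 5, 2]) -- final hidden state only
```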
In [24]:
class BPTT(nn.Module):
    def __init__(self, num_classes, input_size, hidden_size, num_layers):
        super(BPTT, self).__init__()
        
        self.num_classes = num_classes
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.seq_length = seq_length
        
        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        
        self.fc1 = nn.Linear(hidden_size, 32)
        self.fc2 = nn.Linear(32, num_classes)

    def forward(self, x):
        #initial hidden state of our RNN layer (zeros)
        h_0 = torch.zeros(
            self.num_layers, x.size(0), self.hidden_size)

        ula, h_out = self.rnn(x, h_0)
        h_out = h_out.view(-1, self.hidden_size)

        out = torch.relu(self.fc1(h_out))
        out = self.fc2(out)

        return out
In [25]:
model = BPTT(1,1,2,1)
make_dot(model(X_train), params=dict(model.named_parameters()))
Out[25]:
[torchviz computation graph of the BPTT model: rnn.weight_ih/weight_hh and biases feed a chain of AddmmBackward/TanhBackward nodes, one per time step, whose final hidden state feeds fc1 -> ReluBackward -> fc2]

We first instantiate the model class and then call our training function for 300 epochs with the training and validation data.

In [116]:
input_size = 1
hidden_size = 2
num_layers = 1

num_classes = 1

bptt = BPTT(num_classes, input_size, hidden_size, num_layers)

train_loss,test_loss,bptt_computation = training(300,bptt,[X_train,y_train],[X_val,y_val])
Epoch: 0, training loss: 0.35565 , validation loss: 0.13943
Epoch: 20, training loss: 0.04427 , validation loss: 0.00684
Epoch: 40, training loss: 0.00881 , validation loss: 0.00149
Epoch: 60, training loss: 0.00152 , validation loss: 0.00066
Epoch: 80, training loss: 0.00037 , validation loss: 0.00019
Epoch: 100, training loss: 0.00021 , validation loss: 0.00009
Epoch: 120, training loss: 0.00019 , validation loss: 0.00005
Epoch: 140, training loss: 0.00018 , validation loss: 0.00006
Epoch: 160, training loss: 0.00018 , validation loss: 0.00006
Epoch: 180, training loss: 0.00017 , validation loss: 0.00006
Epoch: 200, training loss: 0.00017 , validation loss: 0.00006
Epoch: 220, training loss: 0.00017 , validation loss: 0.00006
Epoch: 240, training loss: 0.00017 , validation loss: 0.00006
Epoch: 260, training loss: 0.00017 , validation loss: 0.00005
Epoch: 280, training loss: 0.00017 , validation loss: 0.00005
In [27]:
bptt_computation
Out[27]:
[3.164975]
Above are the training and validation losses of our backpropagation-through-time model. From the data, we can observe that both losses decreased until about epoch 120, after which the validation loss stayed roughly constant. It took about 3.16 seconds to train the model, which is longer than our previous model.
In [28]:
plot_TrainTestMSE(300,train_loss,test_loss)
In [117]:
data_real,data_predict = testPrediction(X_test,y_test,bptt)
bptt_mse = mean_squared_error(data_real,data_predict) 
bptt_mse
Out[117]:
0.28730303
We got an MSE of 0.2873 on our test data, an improvement over the previous model.
In [158]:
plot_test(bptt,X_test)
Above is the line graph of the original data points and our predicted data points. Our RNN model performed well, as it was able to capture the trend of the data.

LSTM

LSTM model class

We have defined our LSTM class with num_classes, input_size, hidden_size and num_layers as input parameters. We use one LSTM layer followed by a linear layer and an output layer.

Following are the inputs of our LSTM layer

input of shape (seq_len, batch, input_size):

tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.

h_0 of shape (num_layers * num_directions, batch, hidden_size):

tensor containing the initial hidden state for each element in the batch. If the LSTM is bidirectional, num_directions should be 2, else it should be 1.

c_0 of shape (num_layers * num_directions, batch, hidden_size):

tensor containing the initial cell state for each element in the batch.

If (h_0, c_0) is not provided, both h_0 and c_0 default to zero.
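The shapes above can be sketched with toy tensors (using the same layer sizes as below); unlike nn.RNN, nn.LSTM takes and returns both a hidden state and a cell state:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=2, num_layers=1, batch_first=True)
x = torch.randn(5, 4, 1)        # 5 sequences of length 4, one feature per step
h0 = torch.zeros(1, 5, 2)       # initial hidden state
c0 = torch.zeros(1, 5, 2)       # initial cell state
out, (h_n, c_n) = lstm(x, (h0, c0))
print(out.shape, h_n.shape, c_n.shape)
# torch.Size([5, 4, 2]) torch.Size([1, 5, 2]) torch.Size([1, 5, 2])
```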
In [31]:
class LSTM(nn.Module):
    def __init__(self, num_classes, input_size, hidden_size, num_layers):
        super(LSTM, self).__init__()
        
        self.num_classes = num_classes
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.seq_length = seq_length
        
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        
        self.fc1 = nn.Linear(hidden_size, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):
        #initial hidden state of our LSTM layer (zeros)
        h_0 = torch.zeros(
            self.num_layers, x.size(0), self.hidden_size)
        #initial cell state of our LSTM layer (zeros)
        c_0 = torch.zeros(
            self.num_layers, x.size(0), self.hidden_size)

        ula, (h_out, _) = self.lstm(x, (h_0, c_0))
        h_out = h_out.view(-1, self.hidden_size)

        out = torch.relu(self.fc1(h_out))
        out = self.fc2(out)
        return out
In [32]:
model = LSTM(1,1,2,2)
make_dot(model(X_train), params=dict(model.named_parameters()))
Out[32]:
140366528679024 AddmmBackward 140366528679072 fc2.bias (1) 140366528679072->140366528679024 140366528678448 ReluBackward0 140366528678448->140366528679024 140366528678496 AddmmBackward 140366528678496->140366528678448 140366528675952 fc1.bias (64) 140366528675952->140366528678496 140366528677008 ViewBackward 140366528677008->140366528678496 140366526928688 StackBackward 140366526928688->140366528677008 140366526928784 MulBackward0 140366526928784->140366526928688 140366527652528 StackBackward 140366526928784->140366527652528 140366526930656 SigmoidBackward 140366526930656->140366526928784 140366526927776 SplitBackward 140366526927776->140366526930656 140366526928640 SigmoidBackward 140366526927776->140366526928640 140366526929264 SigmoidBackward 140366526927776->140366526929264 140366526929408 TanhBackward 140366526927776->140366526929408 140366526930608 AddBackward0 140366526930608->140366526927776 140366526930512 AddmmBackward 140366526930512->140366526930608 140366526929168 lstm.bias_hh_l0 (8) 140366526929168->140366526930512 140366526927200 AddmmBackward 140366526929168->140366526927200 140366526927152 AddmmBackward 140366526929168->140366526927152 140366526928736 AddmmBackward 140366526929168->140366526928736 140366526929792 MulBackward0 140366526929792->140366526930512 140366526929792->140366527652528 140366526927296 SigmoidBackward 140366526927296->140366526929792 140366526927968 SplitBackward 140366526927968->140366526927296 140366526927872 SigmoidBackward 140366526927968->140366526927872 140366526927536 SigmoidBackward 140366526927968->140366526927536 140366526929216 TanhBackward 140366526927968->140366526929216 140366526927008 AddBackward0 140366526927008->140366526927968 140366526927200->140366526927008 140366526929600 MulBackward0 140366526929600->140366526927200 140366526929600->140366527652528 140366526927248 SigmoidBackward 140366526927248->140366526929600 140366526929072 SplitBackward 140366526929072->140366526927248 140366526930272 SigmoidBackward 
140366526929072->140366526930272 140366526930800 SigmoidBackward 140366526929072->140366526930800 140366526929696 TanhBackward 140366526929072->140366526929696 140366526928016 AddBackward0 140366526928016->140366526929072 140366526927152->140366526928016 140366526929120 MulBackward0 140366526929120->140366526927152 140366526929120->140366527652528 140366526927584 SigmoidBackward 140366526927584->140366526929120 140366526930368 SplitBackward 140366526930368->140366526927584 140366526927488 SigmoidBackward 140366526930368->140366526927488 140366526926960 SigmoidBackward 140366526930368->140366526926960 140366526927440 TanhBackward 140366526930368->140366526927440 140366526930560 AddBackward0 140366526930560->140366526930368 140366526928736->140366526930560 140366526928976 TBackward 140366526928976->140366526928736 140366526930080 lstm.weight_hh_l0 (8, 2) 140366526930080->140366526928976 140366526928160 TBackward 140366526930080->140366526928160 140366526929648 TBackward 140366526930080->140366526929648 140366526929888 TBackward 140366526930080->140366526929888 140366526930704 UnbindBackward 140366526930704->140366526930608 140366526930704->140366526927008 140366526930704->140366526928016 140366526930704->140366526930560 140366526930320 AddBackward0 140366526930320->140366526930704 140366526930848 UnsafeViewBackward 140366526930848->140366526930320 140366526930128 MmBackward 140366526930128->140366526930848 140366526928496 TBackward 140366526928496->140366526930128 140366526927728 lstm.weight_ih_l0 (8, 1) 140366526927728->140366526928496 140366526929552 lstm.bias_ih_l0 (8) 140366526929552->140366526930320 140366526928544 TanhBackward 140366526928544->140366526929120 140366526927392 AddBackward0 140366526927392->140366526928544 140366526927680 MulBackward0 140366526927392->140366526927680 140366526929504 MulBackward0 140366526929504->140366526927392 140366526927488->140366526929504 140366526929984 MulBackward0 140366526929984->140366526927392 
140366526926960->140366526929984 140366526927440->140366526929984 140366526928160->140366526927152 140366526926912 TanhBackward 140366526926912->140366526929600 140366526929024 AddBackward0 140366526929024->140366526926912 140366526928112 MulBackward0 140366526929024->140366526928112 140366526927680->140366526929024 140366526930272->140366526927680 140366526928352 MulBackward0 140366526928352->140366526929024 140366526930800->140366526928352 140366526929696->140366526928352 140366526929648->140366526927200 140366526928304 TanhBackward 140366526928304->140366526929792 140366526928832 AddBackward0 140366526928832->140366526928304 140366526930224 MulBackward0 140366526928832->140366526930224 140366526928112->140366526928832 140366526927872->140366526928112 140366526929456 MulBackward0 140366526929456->140366526928832 140366526927536->140366526929456 140366526929216->140366526929456 140366526929888->140366526930512 140366526930464 TanhBackward 140366526930464->140366526928784 140366526930752 AddBackward0 140366526930752->140366526930464 140366526930224->140366526930752 140366526928640->140366526930224 140366526929744 MulBackward0 140366526929744->140366526930752 140366526929264->140366526929744 140366526929408->140366526929744 140366526929312 MulBackward0 140366526929312->140366526928688 140366526930176 SigmoidBackward 140366526930176->140366526929312 140366526927104 SplitBackward 140366526927104->140366526930176 140366527652000 SigmoidBackward 140366526927104->140366527652000 140366527655696 SigmoidBackward 140366526927104->140366527655696 140366527653728 TanhBackward 140366526927104->140366527653728 140366526927824 AddBackward0 140366526927824->140366526927104 140366526927344 AddmmBackward 140366526927344->140366526927824 140366526928928 lstm.bias_hh_l1 (8) 140366526928928->140366526927344 140366527655072 AddmmBackward 140366526928928->140366527655072 140366527653632 AddmmBackward 140366526928928->140366527653632 140366527653824 AddmmBackward 
140366526928928->140366527653824 140366526928400 MulBackward0 140366526928400->140366526927344 140366527653776 SigmoidBackward 140366527653776->140366526928400 140366527654016 SplitBackward 140366527654016->140366527653776 140366527653680 SigmoidBackward 140366527654016->140366527653680 140366527655888 SigmoidBackward 140366527654016->140366527655888 140366527652288 TanhBackward 140366527654016->140366527652288 140366527653296 AddBackward0 140366527653296->140366527654016 140366527655072->140366527653296 140366527652864 MulBackward0 140366527652864->140366527655072 140366527655504 SigmoidBackward 140366527655504->140366527652864 140366527654064 SplitBackward 140366527654064->140366527655504 140366527655024 SigmoidBackward 140366527654064->140366527655024 140366527652816 SigmoidBackward 140366527654064->140366527652816 140366527652144 TanhBackward 140366527654064->140366527652144 140366527655840 AddBackward0 140366527655840->140366527654064 140366527653632->140366527655840 140366527655600 MulBackward0 140366527655600->140366527653632 140366527653536 SigmoidBackward 140366527653536->140366527655600 140366527654208 SplitBackward 140366527654208->140366527653536 140366527652720 SigmoidBackward 140366527654208->140366527652720 140366527652672 SigmoidBackward 140366527654208->140366527652672 140366527652336 TanhBackward 140366527654208->140366527652336 140366527653872 AddBackward0 140366527653872->140366527654208 140366527653824->140366527653872 140366527653056 TBackward 140366527653056->140366527653824 140366527652912 lstm.weight_hh_l1 (8, 2) 140366527652912->140366527653056 140366527654352 TBackward 140366527652912->140366527654352 140366527652432 TBackward 140366527652912->140366527652432 140366527652960 TBackward 140366527652912->140366527652960 140366526928880 UnbindBackward 140366526928880->140366526927824 140366526928880->140366527653296 140366526928880->140366527655840 140366526928880->140366527653872 140366527654160 AddBackward0 140366527654160->140366526928880 
140366527655552 UnsafeViewBackward 140366527655552->140366527654160 140366527652768 MmBackward 140366527652768->140366527655552 140366527652576 ViewBackward 140366527652576->140366527652768 140366527652528->140366527652576 140366527652624 TBackward 140366527652624->140366527652768 140366527655312 lstm.weight_ih_l1 (8, 2) 140366527655312->140366527652624 140366527653104 lstm.bias_ih_l1 (8) 140366527653104->140366527654160 140366527654544 TanhBackward 140366527654544->140366527655600 140366527654112 AddBackward0 140366527654112->140366527654544 140366527655408 MulBackward0 140366527654112->140366527655408 140366527655744 MulBackward0 140366527655744->140366527654112 140366527652720->140366527655744 140366527653008 MulBackward0 140366527653008->140366527654112 140366527652672->140366527653008 140366527652336->140366527653008 140366527654352->140366527653632 140366527654496 TanhBackward 140366527654496->140366527652864 140366527654784 AddBackward0 140366527654784->140366527654496 140366527654592 MulBackward0 140366527654784->140366527654592 140366527655408->140366527654784 140366527655024->140366527655408 140366527654832 MulBackward0 140366527654832->140366527654784 140366527652816->140366527654832 140366527652144->140366527654832 140366527652432->140366527655072 140366527653968 TanhBackward 140366527653968->140366526928400 140366527655120 AddBackward0 140366527655120->140366527653968 140366526927632 MulBackward0 140366527655120->140366526927632 140366527654592->140366527655120 140366527653680->140366527654592 140366527654448 MulBackward0 140366527654448->140366527655120 140366527655888->140366527654448 140366527652288->140366527654448 140366527652960->140366526927344 140366526930032 TanhBackward 140366526930032->140366526929312 140366526929936 AddBackward0 140366526929936->140366526930032 140366526927632->140366526929936 140366527652000->140366526927632 140366527653920 MulBackward0 140366527653920->140366526929936 140366527655696->140366527653920 
140366527653728->140366527653920 140366528677776 TBackward 140366528677776->140366528678496 140366526929360 fc1.weight (64, 2) 140366526929360->140366528677776 140366528677584 TBackward 140366528677584->140366528679024 140366528677920 fc2.weight (1, 64) 140366528677920->140366528677584
In [118]:
input_size = 1
hidden_size = 2
num_layers = 1

num_classes = 1

lstm = LSTM(num_classes, input_size, hidden_size, num_layers)

train_loss,test_loss,lstm_computation = training(300,lstm,[X_train,y_train],[X_val,y_val])
Epoch: 0, training loss: 0.35108 , validation loss: 0.13395
Epoch: 20, training loss: 0.06047 , validation loss: 0.00482
Epoch: 40, training loss: 0.04403 , validation loss: 0.01812
Epoch: 60, training loss: 0.02160 , validation loss: 0.01529
Epoch: 80, training loss: 0.00204 , validation loss: 0.00221
Epoch: 100, training loss: 0.00038 , validation loss: 0.00011
Epoch: 120, training loss: 0.00031 , validation loss: 0.00006
Epoch: 140, training loss: 0.00027 , validation loss: 0.00007
Epoch: 160, training loss: 0.00025 , validation loss: 0.00007
Epoch: 180, training loss: 0.00024 , validation loss: 0.00007
Epoch: 200, training loss: 0.00023 , validation loss: 0.00007
Epoch: 220, training loss: 0.00022 , validation loss: 0.00007
Epoch: 240, training loss: 0.00021 , validation loss: 0.00007
Epoch: 260, training loss: 0.00020 , validation loss: 0.00007
Epoch: 280, training loss: 0.00020 , validation loss: 0.00007
In [111]:
lstm_computation
Out[111]:
[10.252777999999978]
Above are the training and validation losses of our LSTM model. From the data, we can observe that both losses were decreasing until roughly epoch 120, after which the validation loss stayed constant at about 0.00007. It took about 10.25 milliseconds to train the model, which is higher than both of our previous models.
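The computation time above is presumably collected inside `training` with a wall-clock timer. A minimal sketch of how such a measurement can be taken (the `timed` helper is hypothetical, not part of the notebook):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Example: time a cheap computation
total, ms = timed(sum, range(100_000))
```

`time.perf_counter()` is preferred over `time.time()` for intervals, since it is monotonic and has higher resolution.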
In [112]:
plot_TrainTestMSE(300,train_loss,test_loss)
In [113]:
data_real,data_predict = testPrediction(X_test,y_test,lstm)
lstm_mse = mean_squared_error(data_real,data_predict) 
lstm_mse
Out[113]:
0.27650952
In [159]:
plot_test(lstm,X_test)
Above is the line graph of the original data points and our predicted data points. Our LSTM model performed well, since it was able to capture the trend of the data.
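Note that these predictions live in the MinMax-scaled space. To report errors in actual price terms one would map them back with `scaler.inverse_transform`, which for a single feature reduces to the formula below (a sketch; the min/max values used in the example are hypothetical, the real ones are stored on the fitted scaler as `data_min_`/`data_max_`):

```python
def minmax_inverse(x_scaled, data_min, data_max):
    """Invert MinMax scaling for one feature: x = x_scaled * (max - min) + min."""
    return x_scaled * (data_max - data_min) + data_min

minmax_inverse(0.5, 10.0, 20.0)  # -> 15.0, midpoint of the original range
```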
In [ ]:
 
In [ ]:
 

GRU

In [38]:
class GRU(nn.Module):
    def __init__(self, num_classes, input_size, hidden_size, num_layers):
        super(GRU, self).__init__()
        
        self.num_classes = num_classes
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        
        self.gru = nn.GRU(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        
        self.fc1 = nn.Linear(hidden_size, 32)
        self.fc2 = nn.Linear(32, num_classes)

    def forward(self, x):
        h_0 = Variable(torch.zeros(
            self.num_layers, x.size(0), self.hidden_size))
        
        # Propagate input through GRU
        ula, h_out = self.gru(x, h_0)
        
        h_out = h_out.view(-1, self.hidden_size)
        
        out = self.fc1(h_out)
        out = self.fc2(out)
        
        return out
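With `input_size=1` and `hidden_size=2`, the recurrent part of this GRU is tiny. `nn.GRU` stores, per layer, `weight_ih` (3H × in), `weight_hh` (3H × H), and two bias vectors of length 3H each; a quick pure-Python count (the helper name is ours, not PyTorch's):

```python
def gru_param_count(input_size, hidden_size, num_layers=1, bias=True):
    """Parameter count of a unidirectional nn.GRU."""
    total = 0
    for layer in range(num_layers):
        in_size = input_size if layer == 0 else hidden_size
        total += 3 * hidden_size * in_size      # weight_ih_l{k}
        total += 3 * hidden_size * hidden_size  # weight_hh_l{k}
        if bias:
            total += 2 * 3 * hidden_size        # bias_ih_l{k} and bias_hh_l{k}
    return total

gru_param_count(1, 2)  # recurrent parameters of the model above -> 30
```

The two fully connected layers (`fc1`, `fc2`) add their own weights and biases on top of this.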
In [99]:
input_size = 1
hidden_size = 2
num_layers = 1

num_classes = 1

gru = GRU(num_classes, input_size, hidden_size, num_layers)

train_loss,test_loss,gru_computation = training(300,gru,[X_train,y_train],[X_val,y_val])
Epoch: 0, training loss: 0.58487 , validation loss: 0.35226
Epoch: 20, training loss: 0.04006 , validation loss: 0.00298
Epoch: 40, training loss: 0.00505 , validation loss: 0.00544
Epoch: 60, training loss: 0.00036 , validation loss: 0.00017
Epoch: 80, training loss: 0.00027 , validation loss: 0.00010
Epoch: 100, training loss: 0.00020 , validation loss: 0.00008
Epoch: 120, training loss: 0.00020 , validation loss: 0.00008
Epoch: 140, training loss: 0.00020 , validation loss: 0.00007
Epoch: 160, training loss: 0.00020 , validation loss: 0.00007
Epoch: 180, training loss: 0.00020 , validation loss: 0.00007
Epoch: 200, training loss: 0.00020 , validation loss: 0.00007
Epoch: 220, training loss: 0.00020 , validation loss: 0.00007
Epoch: 240, training loss: 0.00020 , validation loss: 0.00007
Epoch: 260, training loss: 0.00020 , validation loss: 0.00007
Epoch: 280, training loss: 0.00020 , validation loss: 0.00007
In [40]:
gru_computation
Out[40]:
[8.345468999999998]
Above are the training and validation losses of our GRU model. From the data, we can observe that both losses were decreasing until roughly epoch 100, after which they stayed constant. It took about 8.35 milliseconds to train the model, which is lower than the LSTM model.
In [41]:
plot_TrainTestMSE(300,train_loss,test_loss)
In [100]:
data_real,data_predict = testPrediction(X_test,y_test,gru)
gru_mse = mean_squared_error(data_real,data_predict) 
gru_mse
Out[100]:
0.24877092
In [160]:
plot_test(gru,X_test)
Above is the line graph of the original data points and our predicted data points. Our GRU model performed well, since it was able to capture the trend of the data.
In [ ]:
 

Conclusion

In [119]:
Y_computation = [pb_computations[0],bptt_computation[0],lstm_computation[0],gru_computation[0]]
Y_mse = [pb_mse,bptt_mse,lstm_mse,gru_mse]
X = ["PB","BPTT","LSTM","GRU"]
In [161]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(rows=1, cols=2)

fig.add_trace(
    go.Scatter(x=X, y=Y_computation, name = "Time Computation (milliseconds)"),
    row=1, col=1,
   
)

fig.add_trace(
    go.Scatter(x=X, y=Y_mse,name="Test MSE"),
    row=1, col=2
)

fig.update_layout(height=400, width=1000, title_text="Computation time and Test MSE plot")
fig.show()
Above is an interactive plot of the computation time and test MSE of all the models. The first plot shows how the computation time varies across our 4 models, and the second plot shows how the test MSE varies across them.

Plain backpropagation has the best computation time, since it is a simple model requiring far less computation than the LSTM and GRU.

The GRU model achieved the best test MSE, about 0.249. The test MSE decreases from PB -> BPTT -> LSTM -> GRU.
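The LSTM and GRU numbers reported in the outputs above can be collected into one structure for a quick programmatic comparison (values copied from this notebook; PB and BPTT appear in the earlier sections):

```python
# Times and test MSEs copied from the notebook outputs above
results = {
    "LSTM": {"time_ms": 10.25, "test_mse": 0.27650952},
    "GRU":  {"time_ms": 8.35,  "test_mse": 0.24877092},
}

# Pick the model with the lowest test MSE
best_model = min(results, key=lambda m: results[m]["test_mse"])
best_model  # -> 'GRU'
```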
In [ ]: